ORIGINAL ARTICLE

Predicting the diagnosis of autism spectrum disorder using
gene pathway analysis 

E Skafidas, R Testa, D Zantomio, G Chana, IP Everall and C Pantelis

Diagnosis of autism spectrum disorder (ASD) depends on a clinical interview, with no biomarkers to aid diagnosis. The current investigation
interrogated single-nucleotide polymorphisms (SNPs) of individuals with ASD from the Autism Genetic Resource Exchange (AGRE) 
database. SNPs were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG)-derived pathways to identify affected cellular 
processes and develop a diagnostic test. This test was then applied to two independent samples from the Simons Foundation 
Autism Research Initiative (SFARI) and Wellcome Trust 1958 normal birth cohort (WTBC) for validation. Using AGRE SNP data from a 
Central European (CEU) cohort, we created a genetic diagnostic classifier consisting of 237 SNPs in 146 genes that correctly 
predicted ASD diagnosis in 85.6% of CEU cases. This classifier also predicted 84.3% of cases in an ethnically related Tuscan cohort; 
however, prediction was less accurate (56.4%) in a genetically dissimilar Han Chinese cohort (HAN). Eight SNPs in three genes 
(KCNMB4, GNAO1, GRM5) had the largest effect in the classifier with some acting as vulnerability SNPs, whereas others were
protective. Prediction accuracy diminished as the number of SNPs analyzed in the model was decreased. Our diagnostic classifier 
correctly predicted ASD diagnosis with an accuracy of 71.7% in CEU individuals from the SFARI (ASD) and WTBC (controls) validation 
data sets. In conclusion, we have developed an accurate diagnostic test for a genetically homogeneous group to aid in early 
detection of ASD. While SNPs differ across ethnic groups, our pathway approach identified cellular processes common to ASD 
across ethnicities. Our results have wide implications for detection, intervention and prevention of ASD. 
INTRODUCTION

Autism spectrum disorders (ASDs) are a complex group of sporadic
and familial developmental disorders affecting 1 in 150 births and
characterized by abnormal social interaction, impaired
communication and stereotypic behaviors. The etiology of ASD
is poorly understood; however, a genetic basis is evidenced by the
greater than 70% concordance in monozygotic twins and elevated
risk in siblings compared with the population. The search for
genetic loci in ASD, including linkage and genome-wide
association screens (GWAS), has identified a number of candidate
genes and loci on almost every chromosome, with multiple
hotspots on several chromosomes (for example, CNTNAP2, NLGN4X,
NRXN1, IMMP2L, DOCK4, SEMA5A, SYNGAP1, DLGAP2, SHANK2 and
SHANK3), and copy number variations. However,
none of these have provided adequate specificity or accuracy to
be used in ASD diagnosis. Novel approaches are required to
examine multiple genetic variants and their additive contribution,
taking into account genetic differences between
ethnicities and consideration of protective versus vulnerability
single-nucleotide polymorphisms (SNPs).

The present study interrogated the Autism Genetics Resource
Exchange (AGRE) SNP data with two aims: (1) to identify groups
of SNPs that populate known cellular pathways that may be
pathogenic or protective for ASD, and (2) to apply machine
learning to identified SNPs to generate a predictive classifier for
ASD diagnosis. The results were validated in two independent
samples: the US Simons Foundation Autism Research Initiative
(SFARI) and UK Wellcome Trust 1958 normal birth cohort (WTBC).
This novel and strategic approach assessed the contribution of
various SNPs within an additive SNP-based predictive test for ASD.

MATERIALS AND METHODS 

The University of Melbourne Human Research Ethics Committee approved 
the study (Approval Numbers 0932503.1, 0932503.2). 

Subjects 

(i) Index sample: subject data from 2609 probands with ASD (including
Autism, Asperger's or Pervasive Developmental Disorder-not otherwise
specified, but excluding RETT syndrome and Fragile X), and 4165 relatives
of probands, was available from AGRE (http://www.agre.org); 1862
probands and 2587 first-degree relatives had SNP data from the Illumina
550 platform relevant to analyses (Figure 1a). Diagnosis of ASD was made
by a specialist clinician and confirmed using the Autism Diagnostic
Interview Revised (ADI-R). Control training data was obtained from
HapMap instead of relatives, as the latter may possess SNPs that
predispose to ASD and skew analysis (Figures 1a and b).

(ii) Independent validation samples: 737 probands with ASD (ADI-R diagnosed)
derived from SFARI; 2930 control subjects from WTBC (Figure 1b).

Figure 1. (a and b) Flow charts show the subjects used in the analyses. Key: AGRE, Autism Genetic Resource Exchange; SFARI, Simons
Foundation Autism Research Initiative; WTBC, Wellcome Trust 1958 normal birth cohort; CEU, of Central (Western and Northern) European
origin; HAN, of Han Chinese origin; TSI, of Tuscan Italian origin. For panels 1a and b: 'red boxes', samples used in developing the predictive
algorithm; 'blue boxes', samples used to investigate different ethnic groups; 'green boxes', validation sets; 'light green boxes', relatives
assessed, including parents and unaffected siblings. Numbers in brackets represent numbers of males/females.

As SNP incidence rates vary according to ancestral heritage, HapMap
data (Phase 3 NCBI build 36) was utilized to allocate individuals to their
closest ethnicity. Individuals of mixed ethnicity were excluded; HapMap
data has 1 403 896 SNPs available from 11 ethnicities. Any SNPs not
included in the AGRE data measured on the Illumina 550 platform were
discarded, resulting in 407 420 SNPs. Mitochondrial SNPs reported in AGRE
but not available in HapMap were excluded. The 30 most prevalent
(>95%) SNPs within each ethnicity were identified and each ASD
individual assigned to the group for which they shared the highest
number of ethnically specific SNPs. HapMap groups were determined to be
appropriate for analysis, as prevalence rates of the 30 SNPs relevant to
each ethnicity were similar for each AGRE group assigned to that ethnicity,
P<0.05.
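
The allocation rule described above (assign each individual to the group with which they share the most ethnically specific SNPs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `assign_ancestry`, the toy marker sets and the rsIDs are hypothetical, and resolving ties in favor of the first-listed group is an assumption the text does not address.

```python
def assign_ancestry(carried_snps, population_markers):
    """Return (label, shared_count) for the best-matching population.

    carried_snps: set of SNP IDs observed in one individual.
    population_markers: dict mapping a population label to its set of
        marker SNP IDs (in the study, the 30 most prevalent per group).
    """
    best_label, best_shared = None, -1
    for label, markers in population_markers.items():
        shared = len(carried_snps & markers)
        # Strict '>' keeps the first-listed label on ties (an assumption).
        if shared > best_shared:
            best_label, best_shared = label, shared
    return best_label, best_shared


# Hypothetical toy markers: 3 per population instead of 30.
markers = {
    "CEU": {"rsA1", "rsA2", "rsA3"},
    "TSI": {"rsB1", "rsB2", "rsB3"},
    "HAN": {"rsC1", "rsC2", "rsC3"},
}
label, shared = assign_ancestry({"rsA1", "rsA3", "rsC2"}, markers)
```

In this toy case the individual shares two markers with the CEU set and one with the HAN set, so the sketch assigns CEU. A fuller version would also need a rule for flagging mixed ancestry, which the study handled by exclusion.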



Gene set enrichment analysis (GSEA)

Pathway analysis was selected because it depicts how groups of genes
may contribute to ASD etiology (Supplementary S1) and mitigates the
statistical problem of conducting a large number of multiple comparisons
required in GWAS. The current pathway analysis differs from
previous ASD analyses in three ways: (1) we divided the cohort into
ethnically homogeneous samples with similar SNP rates; (2) both
protective and contributory SNPs were accounted for in the analysis and
(3) the pathway test statistic was calculated using permutation analysis.
Although this is computationally expensive, benefits include taking
account of rare alleles, small sample sizes and familial effects. It also
relaxes the Hardy-Weinberg equilibrium assumption that allele and
genotype frequencies remain constant within a population over generations.
Pathways were obtained from the Kyoto Encyclopedia of Genes and
Genomes (KEGG) and SNP-to-gene data obtained from the National Center


© 2014 Macmillan Publishers Limited 



Molecular Psychiatry (2014), 504-510 






for Biotechnology Information (NCBI). Intronic and exonic SNPs were
included. AGRE individuals most closely matching the genetics of Utah
residents of Western and Northern European (CEU), Tuscan Italian (TSI) and
Han Chinese origin were used in the analysis. CEU individuals (975 affected
individuals and 165 controls) were chosen as the index sample,
representing the largest group affected in AGRE (Figure 1a). The CEU
and Han Chinese had 116 753 SNPs that differed, whereas the CEU and TSI
had 627 SNPs, differing in allelic prevalence at P < 1 × 10^[?]. The pathway
test statistic was calculated for CEU and Han individuals using a 'set-based
test' in the PLINK software package, with P = 0.05, r^2 = 0.5 and
permutations set to at least 2 000 000. Significance threshold was set
conservatively at P < 1 × 10^[?], calculated from the number of pathways
being examined (200). Therefore, significance was < 0.05/200, set at
< 1 × 10^[?] (see Supplementary S1).
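
To make the permutation logic and the multiple-comparison arithmetic concrete, here is a minimal sketch of an empirical permutation P-value. It is a generic label-shuffling test on a difference in means, written under stated assumptions; it is not PLINK's set-based test, and the function name, the add-one correction and the toy scores are illustrative choices not taken from the paper.

```python
import random


def permutation_p_value(case_scores, control_scores, n_permutations, seed=0):
    """Empirical one-sided P-value for the case-minus-control mean difference."""
    rng = random.Random(seed)
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)

    def mean_diff(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero (assumption).
    return (hits + 1) / (n_permutations + 1)


# Bonferroni-style bound for 200 pathways, as stated in the text.
bonferroni_threshold = 0.05 / 200  # 0.00025

# Smallest non-zero fraction resolvable with 2 000 000 permutations.
smallest_resolvable_p = 1 / 2_000_000  # 5e-07

p_toy = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4], n_permutations=2000)
```

The number of permutations bounds how small an empirical P-value can be, which is why a stringent pathway threshold requires a permutation count in the millions.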



Predicting ASD phenotype based upon candidate SNPs 
For each individual, a 775-dimensional vector was constructed,
corresponding to 775 unique SNPs identified as part of the GSEA. To examine
whether SNPs could predict an individual's clinical status (ASD versus
non-ASD), two-tail unpaired t-tests were used to identify which of the 775 SNPs
had statistically significant differences in mean SNP value (P < 0.005). This
significance level provided low classification error while maintaining
acceptable variance in estimation of regression coefficients for each
SNP's contribution status, and provided the set of SNPs that maximized the
classifier output between the populations (Figure 2 and Supplementary
S2). This resulted in 237 SNPs selected for regression analysis. Each
dimension of the vector was assigned a value of 0, 1 or 3, dependent
on a SNP having two copies of the dominant allele, being heterozygous or
having two copies of the minor allele. The '0, 1, 3' weighting provided greater
classification accuracy over '0, 1, 2'. Such approaches using superadditive
models have been used previously to understand genetic interactions.
The formula for the classifier and classifier performance are presented
in Supplementary S3.
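
The encoding and filtering steps can be sketched as below. This is an illustration under assumptions rather than the published procedure: the paper specifies two-tail unpaired t-tests but not the variance treatment, so Welch's statistic is used here, and the P-value uses a normal approximation (reasonable only for large samples) to keep the example free of external libraries. All names and the toy genotypes are hypothetical.

```python
import math

# Minor-allele copy number -> value used in the '0, 1, 3' coding.
CODING = {0: 0, 1: 1, 2: 3}


def encode(minor_allele_counts):
    """Map per-individual minor-allele counts (0, 1, 2) to 0, 1, 3."""
    return [CODING[count] for count in minor_allele_counts]


def welch_t(group_a, group_b):
    """Unpaired t statistic without assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)


def approx_two_tailed_p(t_statistic):
    """Two-tailed P-value from a normal approximation (assumption)."""
    return math.erfc(abs(t_statistic) / math.sqrt(2))


def select_snps(cases, controls, alpha=0.005):
    """Keep SNP IDs whose mean coded value differs between groups."""
    selected = []
    for snp_id in cases:
        t = welch_t(cases[snp_id], controls[snp_id])
        if approx_two_tailed_p(t) < alpha:
            selected.append(snp_id)
    return selected


# Hypothetical toy genotypes for two SNPs in six cases and six controls.
cases = {"snp_1": encode([2, 2, 1, 2, 1, 2]), "snp_2": encode([0, 1, 0, 1, 2, 0])}
controls = {"snp_1": encode([0, 0, 1, 0, 1, 0]), "snp_2": encode([1, 0, 0, 1, 0, 2])}
selected = select_snps(cases, controls)
```

With real cohort sizes, an exact t distribution from a statistics library would replace the normal approximation.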

The CEU sample was divided into a training set (732 ASD individuals and
123 controls) and the remainder comprised the validation set. An affected
individual was given a value of 10 and an unaffected individual a value of
-10, providing a sufficiently large separation to maximize the distance
between means (see Supplementary S3). Least squares regression analysis
of the training set determined coefficients such that the sum of the
products of each coefficient and its SNP value mapped an individual's SNPs
to clinical status. A Kolmogorov-Smirnov goodness-of-fit test assessed the
nature of the distribution of SNPs by classification. At
P = 0.05, the distributions were accepted as being normally distributed,
allowing determination of positive and negative predictive values (see
ROC, Supplementary S4). The Durbin-Watson test was used to investigate
the residual errors of the training set to determine if further correlations
existed. At P = 0.05, the residuals were uncorrelated. Regression
coefficients were used to assess individual SNP contribution to clinical status.
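
A least squares fit to the +10/-10 targets can be sketched as follows. The exact classifier formula is given in the paper's Supplementary S3, which is not reproduced here, so this is only an assumed form: an ordinary least squares fit with an intercept term (the intercept is an assumption), solved through the normal equations on hypothetical two-SNP data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(features, targets):
    """Return [intercept, w_1, ..., w_k] minimizing squared error."""
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def score(coefficients, feature_vector):
    """Classifier score: intercept plus weighted sum of coded SNP values."""
    return coefficients[0] + sum(
        w * v for w, v in zip(coefficients[1:], feature_vector)
    )


# Hypothetical '0, 1, 3'-coded values for two SNPs; 10 = ASD, -10 = control.
asd = [[3, 0], [3, 1], [1, 0], [3, 0]]
control = [[0, 3], [0, 1], [1, 3], [0, 3]]
coefficients = fit_least_squares(asd + control, [10] * 4 + [-10] * 4)
asd_scores = [score(coefficients, f) for f in asd]
control_scores = [score(coefficients, f) for f in control]
```

In this toy fit the first SNP receives a positive weight and the second a negative one, mirroring the distinction between contributory and protective SNPs. With 237 SNPs, a numerical library's least squares routine would be the practical choice.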



AGRE validation 

After analyzing the CEU training cohort, three cohorts were used for
validation: 285 (243 probands, 42 controls) CEUs; a genetically similar TSI
sample (65 patients, 88 controls); and a genetically dissimilar Han Chinese
population (33 patients, 169 controls). To illustrate overlap in SNPs in
first-degree relatives of individuals with ASD (n = 1512), we mapped the SNPs
of parents (n = 1219; 581 male) and unaffected siblings (n = 293; 98 male)
of CEU origin who did not meet criteria for ASD. Finally, the
predictive model was modified to test predictive ability using the 10, 30 and 60
SNPs having the greatest weightings.

Independent validation 

Samples included 507 CEU and 18 TSI subjects with ASD from SFARI, and
2557 CEU and 63 TSI from WTBC (Figure 1b).



RESULTS 

Identification of affected pathways 

Analyses focused on 975 CEU ASD individuals, in which 13 KEGG
pathways were significantly affected (P < 1 × 10^[?]). The pathway
analysis identified 775 significant SNPs perturbed in ASD. A
number of the pathways were populated by the same genes and
had inter-related functions (Table 1).




Figure 2. Cumulative coefficient estimation error and percentage
classification error as a function of P-value; P = 0.005 provides a good
trade-off between classification performance and cumulative
regression coefficient error.



The most significant pathways were: calcium signaling, gap
junction, long-term depression (LTD), long-term potentiation
(LTP), olfactory transduction and mitogen-activated kinase-like
protein signaling. GSEA on the genetically distinct Han Chinese
identified six pathways that overlapped with the 13 pathways in the
CEU cohort (estimate of this occurring by chance, P = 0.05),
including: purine metabolism, calcium signaling, phosphatidylinositol
signaling, gap junction, long-term potentiation and long-term
depression. Related to these pathways, the statistically significant
SNPs in both populations were rs3790095 within GNAO1, rs1869901
within PLCB2, rs6806529 within ADCY5 and rs9313203 in ADCY2.



Diagnostic prediction of ASD 

From the 775 SNPs identified within the CEU cohort, accurate
genetic classification of ASD versus non-ASD was possible using
237 SNPs determined to be highly significant (P < 0.005). Figure 3a
shows the distribution of ASD and non-ASD individuals based on
genetic classification. An individual's clinical status was set to ASD
if their score exceeded the threshold of 3.93. This threshold
corresponds to the intersection point of the two normal curves.
The theoretical classification error was 8.55%, and positive (ASD)
and negative (controls) predictive values were 96.72% and
94.74%, respectively. Classification accuracy was 85.6% for the 285 CEU
AGRE validation individuals and 84.3% for the TSI, while
accuracy for the Han Chinese population was only 56.4%. Using the
same classifier with the identical set of SNPs, accuracy of prediction
of ASD in the independent data sets was 71.6%; positive and
negative predictive accuracies were 70.8% and 71.8%, respectively.
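
The idea of placing the decision threshold where two fitted normal curves intersect, and of summarizing performance with predictive values, can be sketched as below. The means and standard deviations of the two score distributions are not reported in this section, so the parameters used here are hypothetical; the sketch also weights the two densities equally, ignoring class priors, which is an assumption.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


def normal_intersection(mean_a, sd_a, mean_b, sd_b):
    """Score at which two normal densities are equal, between the means."""
    # Equating log-densities gives a quadratic a*x^2 + b*x + c = 0.
    a = 1 / (2 * sd_a**2) - 1 / (2 * sd_b**2)
    b = mean_b / sd_b**2 - mean_a / sd_a**2
    c = (
        mean_a**2 / (2 * sd_a**2)
        - mean_b**2 / (2 * sd_b**2)
        + math.log(sd_a / sd_b)
    )
    if abs(a) < 1e-12:  # equal spreads: the curves cross once
        return -c / b
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    low, high = sorted((mean_a, mean_b))
    between = [r for r in roots if low <= r <= high]
    midpoint = (low + high) / 2
    return between[0] if between else min(roots, key=lambda r: abs(r - midpoint))


def predictive_values(tp, fp, tn, fn):
    """Positive and negative predictive values from confusion counts."""
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical control and ASD score distributions.
threshold = normal_intersection(0.0, 2.0, 10.0, 4.0)
ppv, npv = predictive_values(tp=90, fp=10, tn=80, fn=20)
```

When the two spreads are equal the crossing point is simply the midpoint of the means; unequal spreads shift it toward the narrower distribution.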

SNPs of first-degree relatives were compared with those of the affected
and unaffected individuals. Figure 3b shows that relatives (parents and
unaffected siblings combined) fall between the two distributions, with a
mean score of 2.68 (s.d. = 2.27). The percentage overlap of the
relatives and affected individuals was 30.4%. The mean scores of
the mothers and fathers did not differ (at P = 0.05) with scores of
2.83 (s.d. = 2.17) and 2.93 (s.d. = 2.34), respectively (see
Supplementary S5), whereas unaffected siblings (not meeting
diagnostic criteria for ASD) fell between parents and cases
(mean = 4.74, s.d. = 3.80). In testing the robustness of the
predictive model, using fewer SNPs monotonically decreased
accuracy in the AGRE-CEU analyses to 72% for 60 SNPs, 58% for 30
SNPs and 53.5% for 10 SNPs, with the distribution of parents being
indistinguishable from controls.
Table 1. Statistically significant pathways for the CEU and Han Chinese

KEGG pathway   Pathway name                            CEU significance (P-value)   HAN significance (P-value)
hsa04020       Calcium signaling                       5.0 × 10^[?]                 5.0 × 10^[?]
hsa04540       Gap junction                            5.0 × 10^[?]                 5.0 × 10^[?]
hsa04730       Long-term depression                    5.0 × 10^[?]                 5.0 × 10^[?]
hsa04070       Phosphatidylinositol signaling          1.5 × 10^[?]                 5.0 × 10^[?]
hsa04720       Long-term potentiation                  2.5 × 10^[?]                 5.0 × 10^[?]
hsa00230       Purine metabolism                       1.0 × 10^[?]                 5.0 × 10^[?]
hsa04010       Mitogen-activated kinase-like protein   5.0 × 10^[?]                 -
hsa04740       Olfactory transduction                  5.0 × 10^[?]                 -
hsa04910       Insulin signaling pathway               1.5 × 10^[?]                 -
hsa04916       Melanogenesis                           2.0 × 10^[?]                 -
hsa04310       Wnt signaling                           4.0 × 10^[?]                 -
hsa04912       GnRH signaling                          4.5 × 10^[?]                 -
hsa04120       Ubiquitin-mediated proteolysis          7.0 × 10^[?]                 -
hsa04080       Neuroactive ligand receptor             1.2 × 10^[?]                 5.0 × 10^[?]
hsa04062       Chemokine signaling pathway             1.2 × 10^[?]                 5.0 × 10^[?]
hsa04060       Cytokine-cytokine receptor              1.65 × 10^[?]                5.0 × 10^[?]
hsa04114       Oocyte meiosis                          -                            5.0 × 10^[?]
hsa04360       Axon guidance                           -                            5.0 × 10^[?]
hsa04510       Focal adhesion                          -                            5.0 × 10^[?]
hsa04514       Cell adhesion molecules                 -                            5.0 × 10^[?]
hsa04670       Leukocyte transendothelial migration    -                            5.0 × 10^[?]
hsa04144       Endocytosis                             -                            2.0 × 10^[?]
hsa04742       Taste transduction                      -                            2.0 × 10^[?]

Abbreviations: CEU, of Central (Western and Northern) European origin; HAN, of Han Chinese origin; KEGG, Kyoto Encyclopedia of Genes and Genomes
(ftp.kegg.jp).

P-values in bold are statistically significant. The pathways highlighted in 'bold' denote pathways that have reached statistical significance in both populations.



Figure 3. (a) Genetic-based classification of CEU population (AGRE and Controls) for ASD and non-ASD individuals, showing Gaussian
approximation of distribution of individuals (panel title: ASD and Control Groups; x-axis: Autism Classifier Score). As both the mapped
ASD and control populations were well approximated by normal distributions, the asymptotic Test Positive Predictive Value (PPV) and
Negative Predictive Value (NPV) were determined. For individuals with CEU ancestry, the PPV and NPV were 96.72% and 94.74%, respectively.
(Note the test was substantially less predictive on individuals with different ancestry, that is, Han Chinese). (b) Genetic-based
classification of CEU population, including first-degree relatives (parents and siblings of ASD children) (panel title: ASD, Relatives,
Siblings and Control Groups; x-axis: Autism Classifier Score). Note that the distribution of relatives of ASD children maps between the
ASD and the control groups, with no difference found between mothers and fathers (see Supplementary material S5). Key: ASD, autism
spectrum disorder; relatives, first-degree relatives (parents and siblings); Siblings, siblings of ASD cases not meeting criteria for ASD;
Autism Classifier Score, scores for each individual derived from the predictive algorithm, with greater values representing greater risk
for autism.



Of the 237 SNPs within our classifier, presence of some
contributed to vulnerability to ASD (Table 2a), whereas others
were protective (Table 2b). Eight SNPs in three genes, GRM5,
GNAO1 and KCNMB4, were highly discriminatory in determining an
individual's classification as ASD or non-ASD. For KCNMB4,
rs968122 highly contributed to a clinical diagnosis of ASD,
whereas rs12317962 was protective; for GNAO1, SNP rs876619
contributed, whereas rs8053370 was protective; for GRM5, SNP
rs11020772 was contributory, whereas rs905646 and rs6483362
were protective.
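
The split into contributory and protective SNPs follows directly from the sign of each regression weight. The short sketch below uses the Table 2 weights for the SNPs named above (rsIDs as given in the text); the helper names are hypothetical and the ranking by absolute weight is an illustrative way of picking the "greatest weightings", not a procedure confirmed by the paper.

```python
# (rsID, gene, weight) from Table 2 for the three genes named above.
WEIGHTS = [
    ("rs968122", "KCNMB4", 1.5555),
    ("rs876619", "GNAO1", 1.2092),
    ("rs11020772", "GRM5", 0.8641),
    ("rs905646", "GRM5", -0.9624),
    ("rs6483362", "GRM5", -0.9661),
    ("rs12317962", "KCNMB4", -1.3200),
    ("rs8053370", "GNAO1", -1.6956),
]


def split_by_direction(weights):
    """Separate positive (risk) from negative (protective) weights."""
    risk = sorted((w for w in weights if w[2] > 0), key=lambda w: -w[2])
    protective = sorted((w for w in weights if w[2] < 0), key=lambda w: w[2])
    return risk, protective


def top_k_by_magnitude(weights, k):
    """Return the k entries with the largest absolute weight."""
    return sorted(weights, key=lambda w: abs(w[2]), reverse=True)[:k]


risk, protective = split_by_direction(WEIGHTS)
genes_with_both = {gene for _, gene, _ in risk} & {gene for _, gene, _ in protective}
strongest = [snp for snp, _, _ in top_k_by_magnitude(WEIGHTS, 2)]
```

All three genes appear in both lists, which is the pattern the text highlights: the same gene can carry SNPs that raise and SNPs that lower the classifier score.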



DISCUSSION 

Using pathway analysis, we have generated a genetic diagnostic
classifier based on a linear function of 237 SNPs that accurately
distinguished ASD from controls within a CEU cohort. This same
diagnostic classifier was able to correctly predict and identify ASD
individuals with accuracy of 85.6% and 84.3% in the
unseen CEU and TSI cohorts, respectively. Our classifier was then
able to predict ASD group membership in subjects derived from
two independent data sets with an accuracy of 71.6%, thus
adding strength to our original finding. However, the classifier was
sub-optimal at predicting ASD in the genetically distinct Han
Table 2. List of 15 most contributory (Table 2a) and 15 most protective (Table 2b) SNPs for ASD diagnosis in the CEU Cohort

SNP          Weight lower (0.95)   Weight    Weight higher (0.95)   delta    Gene number   Gene symbol

(a) Risk SNPs and their weightings
rs968122      1.5465                1.5555    1.5645                0.0090   27345         KCNMB4
rs876619      0.9476                1.2092    1.4708                0.2616   2775          GNAO1
rs11020772    0.8553                0.8641    0.8729                0.0088   2915          GRM5
rs9288685     0.5856                0.5998    0.6140                0.0142   3635          INPP5D
rs10193128    0.5836                0.5946    0.6056                0.0110   3635          INPP5D
rs7842798     0.5298                0.5386    0.5474                0.0088   114           ADCY8
rs3773540     0.5125                0.5208    0.5291                0.0083   55799         CACNA2D3
rs1818106     0.5002                0.5161    0.5320                0.0159   80310         PDGFD
rs2384061     0.4195                0.4306    0.4417                0.0111   109           ADCY3
rs12582971    0.3983                0.4295    0.4607                0.0312   5288          PIK3C2G
rs10409541    0.4067                0.4189    0.4311                0.0122   773           CACNA1A
rs2300497     0.3782                0.3889    0.3996                0.0107   801           CALM1
rs7562445     0.3741                0.3843    0.3945                0.0102   2066          ERBB4
rs7313997     0.3382                0.3567    0.3752                0.0185   5801          PTPRR
rs2239118     0.3348                0.3552    0.3756                0.0204   775           CACNA1C

(b) Protective SNPs and their weightings
rs17629494   -0.5242               -0.5070   -0.4898                0.0172   5592          PRKG1
rs4648135    -0.5807               -0.5260   -0.4713                0.0547   4790          NFKB1
rs17643974   -0.5527               -0.5424   -0.5321                0.0103   1488          CTBP2
rs1243679    -0.5771               -0.5674   -0.5577                0.0097   341799        OR6S1
rs2240228    -0.5942               -0.5816   -0.5690                0.0126   26532         OR10H3
rs260808     -0.5938               -0.5836   -0.5734                0.0102   80310         PDGFD
rs4128941    -0.6166               -0.6082   -0.5998                0.0084   8313          AXIN2
rs769052     -0.6321               -0.6235   -0.6149                0.0086   7322          UBE2D2
rs984371     -0.7273               -0.7181   -0.7089                0.0092   219437        OR5L1
rs4308342    -1.0196               -0.8938   -0.7680                0.1258   1633          DCK
rs11145506   -0.9400               -0.9172   -0.8944                0.0228   9630          GNA14
rs905646     -0.9700               -0.9624   -0.9548                0.0076   2915          GRM5
rs6483362    -0.9894               -0.9661   -0.9428                0.0233   2915          GRM5
rs12317962   -1.4869               -1.3200   -1.1531                0.1669   27345         KCNMB4
rs8053370    -1.7162               -1.6956   -1.6750                0.0206   2775          GNAO1

Abbreviations: ASD, Autism spectrum disorder; CEU, of Central (Western and Northern) European origin; SNP, single-nucleotide polymorphism.
Weight indicates the contribution of each SNP to ASD clinical status. 'Weight lower' indicates the 0.95 lower error bar of the estimate; 'Weight higher' indicates
the 0.95 upper error bar for that SNP. Note that some genes have SNPs that contribute to risk for ASD and SNPs that protect against ASD.





Chinese cohort, which may be explained by differences in allelic 
prevalence. Although only 627 SNPs significantly differed between 
the TSI and CEU cohorts, this figure increased to 116 753 SNPs 
between the CEU and Han Chinese. It is likely that an additional 
set of SNPs may be predictive of ASD diagnosis in Han Chinese 
and that methods used for our classifier could be applicable to 
other ethnicities. Interestingly, parents and siblings of ASD-CEU 
individuals fell as distinct groups between the ASD and controls, 
reinforcing a genetic basis for ASD with neurobehavioral 
abnormalities reported in parents of ASD individuals also 
supporting our findings. When we altered the classifier by
reducing the number of SNPs, not only did the predictive accuracy 
suffer but also the relatives merged into the control group. This 
suggests that use of relatives as controls in SNP GWAS studies is 
only valid when examining small numbers of SNPs and may not 
be appropriate when assessing genetic interactions. 

There was considerable overlap in the pathways implicated in 
both the CEU and Han Chinese populations. The analysis 
demonstrated that SNPs in the Wnt signaling pathway contributed 
to a diagnosis of ASD in the CEU cohort, but not in the Han 
Chinese population. Although of interest, a firm conclusion 
regarding these differences and similarities will require replication 
in a larger Han Chinese population. Completion of diagnostic 
classification studies for other ethnic groups will invariably aid in 
identification of common pathological mechanisms for ASD. 

The SNPs contributing most to diagnosis in our classifier
corresponded to genes for KCNMB4, GNAO1, GRM5, INPP5D and
ADCY8. The three SNPs that markedly skewed an individual
towards ASD were related to the genes coding for KCNMB4,
GNAO1 and GRM5. Homozygosity for the KCNMB4 SNP carries a higher
risk of ASD than SNPs related to GNAO1 and GRM5. By contrast, a
number of SNPs protected against ASD, including rs8053370
(GNAO1), rs12317962 (KCNMB4), rs6483362 and rs905646 (GRM5).
KCNMB4 is a potassium channel that is important in neuronal
excitability and has been implicated in epilepsy and dyskinesia.
It is highly expressed within the fusiform gyrus, as well as
in superior temporal, cingulate and orbitofrontal regions (Allen
Human Brain Atlas, http://human.brain-map.org/), which are areas
implicated in face identification and emotional face processing
deficits seen in ASD. GNAO1 protein is a subgroup of Gα(o), a
G-protein that couples with many neurotransmitter receptors.
Gα(o) knockout mice exhibit 'autism-like' features, including
impaired social interaction, poor motor skills, anxiety and
stereotypic turning behavior. GNAO1 has also been shown to
have a role in nervous development, co-localizing with GRIN1 at
neuronal dendrites and synapses, and interacting with GAP-43
at neuronal growth cones, with increased levels of GAP-43
demonstrated in the white matter adjacent to the anterior
cingulate cortex in brains from ASD patients.

In our findings, GRM5 SNPs have both a contributory
(rs11020772) and protective (rs905646, rs6483362) effect on
ASD. GRM5 is highly expressed in hippocampus, inferior temporal
gyrus, inferior frontal gyrus and putamen (Allen Human Brain
Atlas), regions implicated in ASD brain MRI studies. GRM5 has a
role in synaptic plasticity, modulation of synaptic excitation, innate
immune function and microglial activation. GRM5-positive
allosteric modulators can reverse the negative behavioral effects
of NMDA receptor antagonists, including stereotypies, sensory
motor gating deficits and deficits in working, spatial and
recognition memory, features described in ASD. With
regard to GRM5's involvement with neuroimmune function, this
receptor is expressed on microglia, with microglial activation
demonstrated by us and others in frontal cortex in ASD.

Further, as GRM5 signaling is mediated via
G-protein-coupled receptors, a possible interaction between
GNAO1 and GRM5 is plausible. Genes such as PLCB2, ADCY2,
ADCY5 and ADCY8 encode proteins involved in G-protein
signaling. Given this association, GRM5 may represent a pivotal
etiological target for ASD; however, further work is needed in
demonstrating these potential interactions and contribution to
glutamatergic dysregulation in ASD.

In conclusion, within genetically homogeneous populations, our 
predictive genetic classifier obtained a high level of diagnostic 
accuracy. This demonstrates that genetic biomarkers can correctly 
classify ASD from non-ASD individuals. Further, our approach of 
identifying groups of SNPs that populate known KEGG pathways 
has identified potential cellular processes that are perturbed in 
ASD, which are common across ethnic groups. Finally, we 
identified a small number of genes with various SNPs of influential 
weighting that strongly determined whether a subject fell within 
the control or ASD group. Overall these findings indicate that a 
SNP-based test may allow for early identification of ASD. Further 
studies to validate the specificity and sensitivity of this model 
within other ethnic groups are required. A predictive classifier as 
described here may provide a tool for screening at birth or during 
infancy to provide an index of 'at-risk status', including probability 
estimates of ASD-likelihood. Identifying clinical and brain-based 
developmental trajectories within such a group would provide the 
opportunity to investigate potential psychological, social and/or 
pharmacological interventions to prevent or ameliorate the 
disorder. A similar approach has been adopted in psychosis 
research, which has improved our understanding of the disorder 
and prognosis for affected individuals.